Speculative motion prediction cache

ABSTRACT

A method and apparatus to improve motion prediction in video processing systems is introduced. When a motion prediction cache completes requesting data for a current macroblock and enters an idle state, reference data from one or more reference frames is speculatively requested, with the hope that the requested data will be needed in a subsequent macroblock. If the speculative data is needed, then it is consumed. However, if the speculative data is not needed, then the correct data must be requested, and a price is paid in extra memory read bandwidth. If the speculative data is the correct data for the subsequent macroblock, the effective memory read latency is reduced and decode performance increases. The video decoder thereby becomes more immune to memory read latency.

FIELD OF INVENTION

This invention is generally related to video data caches.

BACKGROUND

Contemporary video compression algorithms require significant memory bandwidth for referencing previously decoded images. A decoder memory buffer is used to maintain a number of previously decoded image frames, termed reference frames, ready for display so that these frames may be used as references in decoding other image frames. Due to the development and availability of high definition video, the rate at which data in the decoder memory buffers is transferred has substantially increased. In addition, the decoder memory buffer may provide data blocks that are substantially larger than those required by the decoder to process a particular image block, thereby increasing the memory bandwidth without benefit.

Motion prediction is a commonly used technique for encoding video images. According to conventional video encoding techniques employing motion prediction, successive images are compared and the motion of a particular area in one image relative to another image is determined to generate motion vectors. A “macroblock” is a term used in video compression for such an area; typically a macroblock represents a block of 16×16 pixels. Different picture formats utilize different numbers of pixels and macroblocks. For example, a 1920×1088 HDTV pixel format includes 120×68 macroblocks. To decode a video bitstream, a decoder shifts blocks in a previous picture according to the respective motion vectors to generate the next image. This process is based on the use of intracoded frames, forward predicted frames and bi-directional coded frames, as is known in the art.
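As a check on the arithmetic above, the macroblock grid follows from dividing each frame dimension by the macroblock edge length. The following Python sketch assumes 16×16 macroblocks and frame dimensions that are exact multiples of 16; the function name `macroblock_grid` is illustrative only. Note that 1088 = 68 × 16, whereas 1080 is not a multiple of 16.

```python
MB_SIZE = 16  # macroblock edge length in pixels (16x16 macroblocks, as stated above)


def macroblock_grid(width: int, height: int, mb_size: int = MB_SIZE) -> tuple[int, int]:
    """Return (columns, rows) of macroblocks covering a width x height frame."""
    if width % mb_size or height % mb_size:
        raise ValueError("frame dimensions must be multiples of the macroblock size")
    return width // mb_size, height // mb_size


cols, rows = macroblock_grid(1920, 1088)
print(cols, rows, cols * rows)  # 120 68 8160
```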

In some video decoder systems, motion prediction (MP) caches are used to limit the data transfer rate from the memory buffer. An MP cache stores image pixel values for previously decoded macroblocks that may be useful for subsequent macroblocks to be decoded. An MP cache is typically limited in capacity and expensive in comparison to an external memory buffer. An MP cache typically includes only a small portion of the pixel data necessary for a single video frame. Consequently, data in an MP cache is quickly replaced as new macroblocks or parts of macroblocks are written to the cache.

In video decoders, for every macroblock, a set of motion vectors is decoded from a video bitstream and translated into addresses of pixels in the reference frame memory buffers. The pixels are then requested from memory when they are needed, and are expected to return within a macroblock time period. The time elapsed between the request and the return of the image data is called the latency. The latency of the memory system from which the pixels are requested can often be quite high. In that case, the slow return of reference image data slows down the video decoding process, which may prevent some frames from being completed in time for display; those frames are then dropped, which can lead to choppy playback. The motion prediction operation is well known to be a major source of the memory read latency bottleneck in a decoder memory system. Accordingly, improvements in memory use and a reduction of this bottleneck are desired.

SUMMARY

A method and apparatus to improve motion prediction in video processing systems is introduced. When a motion prediction cache completes requesting data for a current macroblock and would typically enter an idle state, reference data from one or more reference frames is speculatively requested, with the hope that the requested data will be needed in a subsequent macroblock. If the speculative data is needed, then it is immediately consumed. However, if the speculative data is not needed, then the correct data must be requested, and a price is paid in extra memory read bandwidth. If the speculative data is the correct data for the subsequent macroblock, the effective memory read latency is reduced and decode performance increases. Thus, the video decoder becomes more immune to memory read latency.

BRIEF DESCRIPTION OF THE DRAWINGS

A more detailed understanding may be had from the following description, given by way of example and to be understood in conjunction with the accompanying drawings, wherein:

FIG. 1 is a block diagram of a system for implementing a speculative motion prediction cache (MPC);

FIG. 2 is a block diagram of the speculative motion vector block of FIG. 1;

FIG. 3 is a more detailed block diagram of the MPC block of FIG. 1;

FIG. 4 is a flow diagram of a process for reducing the memory read latency of a motion prediction system;

FIG. 5 is a diagram of a memory cache of an MPC using direct mapping; and

FIG. 6 is a diagram of a cache tag identifying speculatively requested data.

DETAILED DESCRIPTION

A motion prediction cache (MPC) enables reference image pixel data (i.e., data stored in reference macroblocks) to be used to build other macroblocks. As mentioned above, an MPC may be expensive compared to other types of memory. Preferably, the MPC is large enough to store at least one reference macroblock of prediction pixels, which enables the MPC to rapidly accommodate all data requests for a current reference macroblock. The size of an MPC may be determined by application-specific criteria, including various modes of operation and different tile configurations.

During the course of the video decoding process, the MPC would typically experience idle periods. A method is introduced herein that takes advantage of these idle periods by speculatively requesting data during them, thereby reducing the effective memory read latency.

FIG. 1 shows a system 100 for implementing speculative motion prediction. The system 100 includes a bitstream entropy decoder 210, a motion predictor (MP) 220, a motion prediction cache (MPC) 230, a memory block 240, and a speculative motion vectors (SMV) block 250. The bitstream entropy decoder 210 receives a video bitstream and converts the received bitstream into intermediate symbols, quantized coefficients, and motion vector information (e.g., motion vectors, reference image identifier, and macroblock identifier).

The MP 220 receives the motion vector information from the bitstream entropy decoder 210 and transmits a request for reference image data to the MPC 230. The MP 220 then receives the reference image data and outputs a predicted macroblock.

The MPC 230 receives requests from the MP 220 and the SMV block 250 for reference image data for a macroblock, sends data requests to the memory block 240, and receives and stores the requested reference image data for use in building subsequent macroblocks. The MPC 230 is also configured to output a state identifier, which may notify other blocks of its present state for scheduling purposes.

The SMV block 250 receives motion vector information from the bitstream entropy decoder 210, as well as the state identifier from the MPC 230, and outputs data requests for reference image data from the memory block 240 when it determines that the MPC 230 has entered, or will enter, an idle period.

The memory block 240 receives memory requests, retrieves the requested reference image data, and outputs the requested reference image data.

The components of the system 100 are now discussed in greater detail. The bitstream entropy decoder 210 receives a video bitstream and converts it into intermediate symbols, quantized coefficients, and motion vector information (e.g., motion vectors, reference image identifier, and macroblock identifier). The particular implementation of the bitstream entropy decoder 210 may vary depending on the application in which it is used. For example, a context-based adaptive binary arithmetic coding (CABAC) bitstream entropy decoder generates intermediate symbols, quantized coefficients, and motion vector information produced by transform-decoding. CABAC decoders may be implemented in an integrated circuit or in software executing on general purpose processors.

The MP 220 predicts motion using the motion vector information of a current macroblock and a reference frame that is typically stored in memory (in this case, the memory block 240). The MP 220 receives a multiplexed data signal including intermediate symbols, quantized coefficients, and motion vector information. The MP 220 may comprise a demultiplexer that receives the multiplexed intermediate symbols, quantized coefficients, and motion vector information and outputs demultiplexed motion vector information components (i.e., motion vectors, reference image identifiers, and macroblock identifiers). The MP 220 uses the motion vector information components to calculate a memory address. The calculated memory address may contain a cache address, a reference frame number, a macroblock number, or any other type of identification of the location or address of reference image data stored in the memory block 240.
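How motion vector information maps to a memory address depends on how reference frames are laid out in memory, which the text leaves open. The following Python sketch assumes a hypothetical layout: each reference frame is stored row-major at its own base address with one byte per luma sample, motion vectors are in whole pixels, and the reference block is simply clamped to stay inside the frame. Real decoders also handle sub-pixel vectors and edge padding, which this sketch omits. `FrameLayout` and `reference_block_address` are illustrative names, not part of the described apparatus.

```python
from dataclasses import dataclass

MB_SIZE = 16  # macroblock edge length in pixels


@dataclass(frozen=True)
class FrameLayout:
    """Hypothetical row-major layout of one 8-bit luma reference frame."""
    base: int    # byte address of pixel (0, 0)
    width: int   # pixels per row; equals the row pitch in bytes at 1 byte/pixel
    height: int  # rows in the frame


def reference_block_address(frame: FrameLayout, mb_x: int, mb_y: int,
                            mv_x: int, mv_y: int) -> int:
    """Address of the top-left pixel of the 16x16 reference block used by
    macroblock (mb_x, mb_y) with integer-pixel motion vector (mv_x, mv_y)."""
    x = mb_x * MB_SIZE + mv_x
    y = mb_y * MB_SIZE + mv_y
    # Simplified edge handling: keep the whole block inside the frame.
    x = max(0, min(x, frame.width - MB_SIZE))
    y = max(0, min(y, frame.height - MB_SIZE))
    return frame.base + y * frame.width + x


# The reference image identifier selects the frame; the vector selects the block.
reference_frames = {0: FrameLayout(base=0x100000, width=1920, height=1088)}
addr = reference_block_address(reference_frames[0], mb_x=2, mb_y=1, mv_x=-3, mv_y=5)
print(hex(addr))
```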

Once the memory address is calculated, a request is generated for the reference image data stored at that memory address in the memory block 240. The MP 220 then transmits the request and waits for the requested reference image data to return. When the MP 220 receives the reference image data, it filters the reference image data and reconstructs a macroblock. The reconstructed macroblock is then output to the system for further decoding. The MP 220 output represents a block position on the basis of a predictive error with respect to an appropriate range for the reference image corresponding to the motion prediction reference image.

Referring to FIG. 3, the MPC 230 of system 100 is shown in greater detail. The MPC 230 enables the MP 220 to use the reference image data to build a macroblock. The MPC 230 may include a control module 331, a motion prediction cache buffer 332 having a tag memory 333 and a data cache memory 334, an external data request module 335, a request queue 336, and a state machine 337. The MPC 230 responds to a request for reference image data for a current macroblock by requesting the data from the memory block 240 shown in FIG. 1. The request may be received from the MP 220 or the SMV block 250. Optionally, the MPC 230 may be further configured to apply a priority when requesting reference image data, wherein requests from the MP 220 have a higher priority; consequently, a request from the MP 220 terminates any pending requests from the SMV block 250.
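The optional priority rule can be modeled as a request queue in which a demand request from the MP 220 discards any speculative requests from the SMV block 250 that have not yet been issued. This is a minimal Python sketch under the assumption that "terminating" a pending SMV request means removing it from the queue before it reaches the memory block 240; cancelling a request already in flight would need support from the memory interface and is not modeled. `Request` and `PriorityRequestQueue` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    address: int
    speculative: bool  # True if issued by the SMV block, False if by the MP


class PriorityRequestQueue:
    """Queue in which a demand (MP) request cancels waiting speculative requests."""

    def __init__(self) -> None:
        self.pending: deque[Request] = deque()

    def submit(self, req: Request) -> None:
        if not req.speculative:
            # MP requests take priority: drop speculative requests not yet issued.
            self.pending = deque(r for r in self.pending if not r.speculative)
        self.pending.append(req)

    def issue(self) -> Optional[Request]:
        """Pop the next request to send to memory, or None if nothing is pending."""
        return self.pending.popleft() if self.pending else None


q = PriorityRequestQueue()
q.submit(Request(0x40, speculative=True))
q.submit(Request(0x80, speculative=True))
q.submit(Request(0xC0, speculative=False))
print([hex(r.address) for r in q.pending])  # ['0xc0']
```

Earlier demand requests are never dropped, so the MP 220 always sees its own requests served in order.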

In operation, a request including the calculated address is received at the control module 331. The control module 331 provides overall control of the MPC 230. The data cache memory 334 stores any reference image data that is retrieved from the memory block 240. The tag memory 333 stores a “tag” or listing of the reference image data blocks that are stored in the data cache memory 334. Any requests for external data from the memory block 240 are handled by the external data request module 335. These requests are also placed into the request queue 336 and monitored by the state machine 337. If the state machine 337 observes that a request including a particular address for reference image data was previously requested and is stored in the data cache memory 334, the state machine 337 enables the data cache memory 334 to forward the previously retrieved data to the control module 331. This process is described in greater detail below. The control module 331 examines the request and searches the tag memory 333 to determine whether the address associated with the request is stored in the tag memory 333. If so, the associated reference image data is stored in the data cache memory 334. If the search of the tag memory is unsuccessful, meaning the requested reference image data is not stored in the MPC 230, a request to the memory block 240 is made by the external data request module 335.

The tag memory 333 is written with at least some of the parameters in the request. If the search was successful, meaning the requested reference image data is already stored in the data cache memory 334, the data is read from the data cache memory 334 to the control module 331 and then to the MP 220.

Regardless of whether or not the search of the tag memory 333 was successful, the search parameters are written to the request queue 336 and, if the request queue 336 is not full, the next request received is serviced. When the MPC 230 has completed requesting data for a current data block, which may be indicated by an empty request queue 336, the state machine 337 transmits a signal indicating that it has entered an idle state. If the MPC 230 receives a request from the SMV block 250 during the idle state, the MPC 230 can speculatively request reference image data from the memory block 240. This request is processed in the same manner as a request from the MP 220. The requested reference image data is stored in the data cache memory 334, and the associated tag is stored in the tag memory 333, in the hope that the reference image data will be needed for a subsequent macroblock.
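The lookup, request-queue, and idle behavior described above can be summarized in a small behavioral model. This Python sketch makes simplifying assumptions not stated in the text: the cache is fully associative with unlimited capacity (a dict stands in for the tag memory 333 and data cache memory 334), each request names a single address, and another dict stands in for the memory block 240. `MPCModel` and its methods are hypothetical names.

```python
from collections import deque


class MPCModel:
    """Behavioral sketch of the MPC 230: tag lookup, request queue, idle signal."""

    def __init__(self, memory: dict[int, str]) -> None:
        self.memory = memory                 # stands in for memory block 240
        self.cache: dict[int, str] = {}      # tag memory + data cache memory
        self.queue: deque[int] = deque()     # request queue 336
        self.external_reads = 0              # fetches issued to memory block 240

    @property
    def idle(self) -> bool:
        # An empty request queue is taken as the idle indication.
        return not self.queue

    def request(self, address: int) -> None:
        """Queue a demand request from the MP 220."""
        self.queue.append(address)

    def _fetch(self, address: int) -> None:
        # Tag miss: the external data request module reads memory and fills the cache.
        if address not in self.cache:
            self.cache[address] = self.memory[address]
            self.external_reads += 1

    def service(self) -> list[str]:
        """Drain the queue, returning data for each demand request in order."""
        out = []
        while self.queue:
            address = self.queue.popleft()
            self._fetch(address)
            out.append(self.cache[address])
        return out

    def speculate(self, address: int) -> bool:
        """Speculative request from the SMV block 250; honored only while idle."""
        if not self.idle:
            return False
        self._fetch(address)
        return True


memory = {a: f"pixels@{a}" for a in range(8)}
mpc = MPCModel(memory)
mpc.request(1)
mpc.request(2)
print(mpc.service(), mpc.idle)            # ['pixels@1', 'pixels@2'] True
mpc.speculate(3)                          # idle, so data for address 3 is prefetched
mpc.request(3)
print(mpc.service(), mpc.external_reads)  # ['pixels@3'] 3
```

The demand request for address 3 is served from the cache without a further external read, which is the latency saving the speculative request is meant to provide.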

Referring to FIG. 2, the SMV block 250 of FIG. 1 is shown in greater detail. The SMV block 250 generates “speculative” data requests for the MPC 230 when the MP 220 to MPC 230 data flow is idle. A speculative data request is not a request for reference image data that is needed (as is the case with a request from the MP 220); rather, it is a request for reference image data that might be needed.

Ideally, when the MP 220 is ready to request data for a subsequent macroblock, the MPC 230 will have already requested the reference image data from the memory block 240 based on the requests that originate from the SMV block 250 during a period when the MPC 230 would otherwise be in an idle state.

The SMV block 250 includes a motion vector calculator 253, a register 254, and a motion vector memory to address translator (MVMAT) 256. The motion vector calculator 253 receives motion vector information (e.g., macroblock information, motion vectors, mode and reference image information, and a frame start data field) from the bitstream entropy decoder 210. The register 254 stores the trend of the motion vector direction based on past motion vectors. A moving-window average of recent trends in the motion vector direction is calculated from the received information using extrapolation techniques. This average is used to generate a speculatively predicted motion vector for a subsequent macroblock. The speculatively predicted motion vector is sent to the MVMAT 256, which translates the vector into a corresponding memory address in the memory block 240. The MVMAT 256 then generates a data request for the data located at the selected memory address.
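The moving-window extrapolation can be sketched as follows. The text does not specify the window length or the exact extrapolation formula, so this Python sketch assumes a window of four vectors and predicts the next vector as the most recent one plus the average change between consecutive vectors in the window; sub-pixel precision is ignored. `TrendPredictor` is a hypothetical name. The predicted vector would then be handed to an address translation step such as the one performed by the MVMAT 256.

```python
from collections import deque

MotionVector = tuple[int, int]


class TrendPredictor:
    """Sketch of the motion vector calculator 253 with register 254.

    Keeps the last `window` motion vectors and extrapolates the next one as the
    most recent vector plus the average per-macroblock change across the window.
    """

    def __init__(self, window: int = 4) -> None:
        self.history: deque[MotionVector] = deque(maxlen=window)  # register 254

    def observe(self, mv: MotionVector) -> None:
        self.history.append(mv)

    def predict(self) -> MotionVector:
        h = list(self.history)
        if not h:
            return (0, 0)  # no history yet: assume zero motion
        if len(h) == 1:
            return h[0]
        # Moving-window average of the per-macroblock trend in each component.
        steps = list(zip(h, h[1:]))
        dx = sum(b[0] - a[0] for a, b in steps) / len(steps)
        dy = sum(b[1] - a[1] for a, b in steps) / len(steps)
        last = h[-1]
        return (round(last[0] + dx), round(last[1] + dy))


p = TrendPredictor(window=4)
for mv in [(0, 0), (2, 1), (4, 2), (6, 3)]:
    p.observe(mv)
print(p.predict())  # (8, 4)
```

Because the window has a fixed length, old vectors drop out automatically and the prediction follows changes in the motion trend.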

It should be understood that the SMV block 250 may employ a number of methods to speculate the motion vectors for a subsequent macroblock. For example, if a video decoder is operating in a horizontal scan mode (e.g., the HD resolutions in the H.264 and VC-1 standards), the speculative motion vectors may be the same as the current vectors, or the average of the vectors from the left, top-left, and top macroblocks. Alternatively, the SMV block 250 could maintain a running average of the motion per macroblock and extrapolate from that running average. Many other similar methods may be implemented. The particular method may be chosen adaptively based on the decoding performance, which is measurable.
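Two of the methods mentioned above, and an adaptive choice between them, might be combined as in the following Python sketch. Two assumptions are made: a speculation counts as a "hit" only when the predicted vector exactly equals the decoded one (in practice a near miss may still fetch useful cache lines), and this hit count stands in for the measured decoding performance the text refers to. All names are hypothetical; a running-average method could be registered as a further entry in `methods`.

```python
MotionVector = tuple[int, int]
Neighbors = tuple[MotionVector, MotionVector, MotionVector]  # (left, top-left, top)


def same_as_current(current: MotionVector, neighbors: Neighbors) -> MotionVector:
    """Speculate that the next macroblock reuses the current vector."""
    return current


def neighbor_average(current: MotionVector, neighbors: Neighbors) -> MotionVector:
    """Speculate the average of the left, top-left and top neighbor vectors."""
    xs = [mv[0] for mv in neighbors]
    ys = [mv[1] for mv in neighbors]
    return (round(sum(xs) / 3), round(sum(ys) / 3))


class AdaptiveSpeculator:
    """Chooses among speculation methods by how often each would have been right."""

    def __init__(self) -> None:
        self.methods = {"same": same_as_current, "neighbor_avg": neighbor_average}
        self.hits = dict.fromkeys(self.methods, 0)

    def predict(self, current: MotionVector, neighbors: Neighbors) -> MotionVector:
        best = max(self.methods, key=self.hits.__getitem__)  # ties go to "same"
        return self.methods[best](current, neighbors)

    def update(self, current: MotionVector, neighbors: Neighbors,
               actual: MotionVector) -> None:
        # Score every method once the real vector for the next macroblock is decoded.
        for name, method in self.methods.items():
            if method(current, neighbors) == actual:
                self.hits[name] += 1


smv = AdaptiveSpeculator()
cur, nbrs = (4, 0), ((0, 0), (3, 3), (6, 0))
print(smv.predict(cur, nbrs))  # (4, 0): no history yet, so "same" wins the tie
smv.update(cur, nbrs, actual=(3, 1))
print(smv.predict(cur, nbrs))  # (3, 1): neighbor_avg has now scored a hit
```

Scoring every method on every macroblock, rather than only the one in use, lets a currently unused method overtake the active one as soon as it starts predicting better.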

Referring back to FIG. 1, the system 100 also includes a memory block 240. As mentioned above, the memory block 240 receives requests for reference data from the MPC 230, and additionally from other parts of the system 100. The memory block 240 may include a memory control interface (MCI) 242 and a reference frame memory buffer 244. The MCI 242 manages the control and timing of data transfers, bus arbitration, and memory access. Additionally, the MCI 242 may provide a dedicated channel to transfer image data that may be stored in the memory block 240. The memory block 240 stores reference image data for later retrieval upon request. In the present embodiment, the reference frame memory buffer 244 comprises dynamic random access memory (DRAM). Alternatively, the reference frame memory buffer 244 may comprise SRAM, virtual memory, flash memory, or any other suitable memory buffer.

The process of reducing the memory read latency of a motion prediction system is shown in FIG. 4. A video bitstream is received (step 410). The video bitstream is used to calculate motion vectors, which are then used to calculate the addresses of reference image data for a current macroblock. The MP 220 requests reference image data corresponding to the calculated reference frame addresses. The MPC 230 receives the data requests and requests the reference image data from the memory block 240. After the requested reference image data has been received by the MPC 230 (step 420), a determination is made as to whether reference image data for a new macroblock is ready to be requested (step 425). If the reference image data for the new macroblock is not ready to be requested (i.e., the MP 220 has not calculated the addresses of reference image data for a new macroblock), and the MPC 230 would otherwise enter an idle state, reference image data in the memory block 240 is speculatively requested (step 430). The speculative data request can be for a full macroblock of data or for less than a full macroblock. The actual amount of data that is speculatively received is dynamically adjusted based on the decoding status of the present macroblock. Thereafter, the MP 220 is ready to request reference image data for a new macroblock (step 435).

A determination is then made as to whether the reference image data that has been speculatively requested corresponds to the data requested by the MP 220 for the subsequent image (step 440). If the data is needed, it is immediately consumed (step 460). However, if the speculatively requested data is not needed, the correct reference image data must be requested (step 450) and the speculatively requested data must be purged.
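The control flow of FIG. 4 can be traced with a short simulation. This Python sketch reduces each macroblock's reference data to a single address and assumes that, at step 425, the MP 220 is never ready before the MPC 230 would go idle, so a speculative request is always made. `run_fig4_flow` and `next_in_row` are hypothetical names. Hits correspond to latency saved; misses correspond to extra read bandwidth spent.

```python
from typing import Callable, Optional, Sequence


def run_fig4_flow(needed: Sequence[int],
                  speculate: Callable[[list[int]], Optional[int]]) -> tuple[int, int]:
    """Walk the FIG. 4 flow over a sequence of macroblocks.

    needed[i] is the reference-data address macroblock i actually requires.
    speculate(history) returns the address to request speculatively, or None.
    Returns (hits, misses); a speculative request issued after the last
    macroblock is never checked and so is not counted.
    """
    hits = misses = 0
    history: list[int] = []
    speculative: Optional[int] = None
    for address in needed:                    # step 435: MP requests a new macroblock
        if speculative is not None:           # step 440: was the speculation right?
            if speculative == address:
                hits += 1                     # step 460: consume immediately
            else:
                misses += 1                   # step 450: request correct data, purge
        history.append(address)               # step 420: current data received
        # Steps 425/430: assumed the MP is not ready yet, so the idle MPC speculates.
        speculative = speculate(history)
    return hits, misses


def next_in_row(history: list[int]) -> int:
    # Hypothetical predictor: the next macroblock's data sits 16 bytes further on.
    return history[-1] + 16


print(run_fig4_flow([0, 16, 32, 48, 100], next_in_row))  # (3, 1)
```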

If the speculatively requested data is not used, it may be preferable to purge the unused speculative data from the tag memory 333 more efficiently. FIG. 5 shows the motion prediction cache buffer 332 of the MPC 230 in greater detail. Each location in each memory holds a datum, called a cache line, whose size may vary depending upon the system design. Each location in each memory also has an index (usually the lower-order bits of the address), which is used to address a word in the cache. The remaining high-order bits of the address, called the tag, are stored in the motion prediction cache buffer 332 along with the data. When a request is received at the control module 331, the control module 331 examines the request and searches the tag memory 333 for the tag associated with the request. If the control module 331 determines that the reference image data is not found in the motion prediction cache buffer 332, the search is deemed unsuccessful. If the cache tags searched were speculatively requested, they are then purged to make room for additional memory requests. To further streamline this process, the cache tags may be modified to include an “s-flag”, as shown in FIG. 6, so that cache lines that were speculatively requested but not used in the current macroblock can be easily identified by the s-flag and purged first.
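The index/tag split and the s-flag can be illustrated with a direct-mapped cache model. The line size and line count below are assumed values, not taken from the text, and purging is interpreted here as invalidating any line whose s-flag is still set at the end of a macroblock; a line's s-flag is cleared when a demand lookup hits it. `split_address` and `DirectMappedMPC` are hypothetical names.

```python
from typing import Optional

LINE_BYTES = 64   # assumed cache line size in bytes
NUM_LINES = 256   # assumed number of lines in the direct-mapped cache


def split_address(address: int) -> tuple[int, int]:
    """Split a byte address into (index, tag) for a direct-mapped cache."""
    line_number = address // LINE_BYTES    # drop the byte offset within a line
    index = line_number % NUM_LINES        # lower-order bits select the cache location
    tag = line_number // NUM_LINES         # remaining high-order bits form the tag
    return index, tag


class DirectMappedMPC:
    """Direct-mapped tag store in which each entry carries an s-flag."""

    def __init__(self) -> None:
        # Each entry is (tag, s_flag), or None when the line is empty.
        self.lines: list[Optional[tuple[int, bool]]] = [None] * NUM_LINES

    def fill(self, address: int, speculative: bool = False) -> None:
        index, tag = split_address(address)
        self.lines[index] = (tag, speculative)  # replaces whatever was at this index

    def lookup(self, address: int) -> bool:
        """Demand lookup; a hit clears the s-flag because the data was used."""
        index, tag = split_address(address)
        entry = self.lines[index]
        if entry is not None and entry[0] == tag:
            self.lines[index] = (tag, False)
            return True
        return False

    def purge_unused_speculative(self) -> int:
        """At the end of a macroblock, invalidate lines whose s-flag is still set."""
        purged = 0
        for i, entry in enumerate(self.lines):
            if entry is not None and entry[1]:
                self.lines[i] = None
                purged += 1
        return purged


cache = DirectMappedMPC()
cache.fill(0x1000, speculative=True)     # speculative fetch that will be used
cache.fill(0x2040, speculative=True)     # speculative fetch that will not be used
print(cache.lookup(0x1000))              # True
print(cache.purge_unused_speculative())  # 1
print(cache.lookup(0x2040))              # False
```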

Compared with known art, potential advantages include enabling Blu-ray and HD-DVD high-definition video support on low-end chips. Additionally, the approach could enable dual HD video support.

Although the features and elements are described in particular combinations, each feature or element can be used alone without the other features and elements, or in various combinations with or without other features. The methods or flow charts provided may be implemented in a computer program, software, or firmware tangibly embodied in a computer-readable storage medium for execution by a general purpose computer or a processor. Examples of computer-readable storage media include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).

Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine.

1. A method for motion prediction implemented in video compression, the method comprising: receiving motion vector information for a first image; generating a speculative motion vector from the received motion vector information; obtaining reference frame data based on the speculative motion vector; receiving motion vector information for a subsequent image; determining whether said obtained reference frame data corresponds to said motion vector information for said subsequent image and, if so, using said obtained reference frame data to construct at least a portion of said subsequent image.
2. The method of claim 1, further comprising: transmitting a state identifier by a motion prediction cache (MPC), notifying the other blocks of its present state; and scheduling the request for reference frame data using the state identifier.
3. The method of claim 2, wherein the scheduling of the request for the reference frame data occurs when the state identifier indicates an idle state.
4. The method as in claim 3, further comprising: the MPC receiving a request for the reference frame data; and requesting the reference frame data from a system memory.
5. The method of claim 1, further comprising: receiving speculative reference frame data; determining that the reference frame data is needed for a macroblock; and storing the reference frame data.
6. The method of claim 1, further comprising: requesting a macroblock of reference frame data.
7. The method of claim 1, wherein the motion vector information comprises a current macroblock identifier, at least one motion vector, and at least one reference frame identifier.
8. The method of claim 1, further comprising: selecting a plurality of speculative motion vectors adaptively based on a measured decoding performance.
9. The method of claim 1, further comprising: modifying a plurality of cache tags to include a flag identifying data as a plurality of speculative cache data.
10. The method as in claim 9, further comprising: removing a plurality of speculative cache data lines, associated with the speculative cache data, when the speculative cache data are not used in a current macroblock.
11. The method of claim 1, further comprising: determining a plurality of trends in a motion vector direction; generating a moving window average of the plurality of trends in the motion vector direction; and generating a plurality of speculative motion vectors based on the moving window average.
12. The method of claim 1, wherein the motion vector information is for a portion of a frame.
13. The method of claim 12, wherein the portion of the frame is a macroblock.
14. A system for motion prediction implemented in video compression, the system comprising: a bitstream entropy decoder configured to decode a received bitstream; a motion predictor configured to predict motion of a bitstream; a motion prediction cache (MPC) configured to request and store reference frame data; and a speculative motion vectors block (SMV) configured to speculatively request reference frame data.
15. The system of claim 14, further comprising: the MPC configured to transmit a state identifier.
16. The system of claim 15, further comprising: the SMV configured to receive the state identifier, and schedule a request for reference frame data when the state identifier indicates an idle state.
17. The system of claim 14, further comprising: the MPC configured to receive a request for the reference frame data, and request the reference frame data from a system memory.
18. The system of claim 14, further comprising: the SMV configured to receive motion vector information, and to generate a speculative motion vector from the received motion vector information.
19. The system of claim 14, further comprising: the SMV configured to generate a request for reference frame data based on the speculative motion vector, and to transmit the request for speculative reference frame data.
20. The system of claim 14, further comprising: the MPC further configured to receive the speculative reference frame data and to determine whether the reference frame data is needed for a macroblock; and if so, to store the reference frame data.
21. The system of claim 14, further comprising: the SMV configured to request a macroblock of reference frame data.
22. The system of claim 14, wherein the motion vector information comprises a current macroblock identifier, at least one motion vector, and at least one reference frame identifier.
23. The system of claim 14, further comprising: the SMV configured to select the speculative motion vectors adaptively based on a measured decoding performance.
24. The system as in claim 14, further comprising: the MPC configured to include a flag on a plurality of cache tags to identify data as speculative cache data.
25. The system of claim 24, further comprising: the MPC configured to remove a plurality of speculative cache data lines associated with the speculative cache data first, when the speculative cache data are not used in a current macroblock.
26. The system of claim 14, further comprising: the SMV configured to determine a plurality of trends in a motion vector direction, to generate a moving window average of the plurality of trends in the motion vector direction, and to generate the speculative motion vectors based on the moving window average.